ABSTRACT
Criterion-referenced testing (CRT) is defined as a method of ascertaining an individual's status with respect to some performance standard. Computer-assisted testing (CAT) is a method of constructing tests using a variety of computer techniques such as single-test computer printouts, stored item banks, teacher-specified criteria, machine-readable answer sheets, etc. After an examination of the literature on both subjects, the conclusion reached is that CRT and CAT may help each other in the following ways: (1) item generation techniques may be refined to allow more comprehensive evaluation of domains by making more items available; (2) item sampling algorithms may be used to achieve more representative tests from existing domains; (3) branching tests may be utilized to arrive at the most cost-effective method for evaluating performance; (4) test models may be simulated to ascertain their feasibility; (5) mathematical models may be developed to help define and standardize the criteria by which performance is judged; and (6) CRT can be used more widely as a valid theory to aid the design of CAT systems. There is a 16-page annotated bibliography divided into two separate subject divisions. (Author/NR)
PURPOSE AND ORGANIZATION OF THIS REPORT


Literature on both criterion-referenced testing (CRT) and computer-assisted testing (CAT) is abundant. Relatively few researchers, however, have attempted to synthesize these two fields. The author examined literature from both fields in an effort to:

(1) identify studies that have used CRT models in designing CAT systems,

(2) gain a thorough understanding of the test administration and analysis procedures used in those studies, and

(3) discover other facets of CRT models that might be realized through CAT techniques.

This paper reports on the literature search conducted by the author by discussing representative studies in both fields and noting additional research efforts in an annotated bibliography. The report begins by discussing articles on CRT theories and models. These articles provide a background for examining the second set of papers: reports on existing CAT systems. Conclusions are drawn about the state of the art for both CRT and CAT, and comments are made on areas in which the two fields might complement each other.
CRITERION-REFERENCED TESTING


Theory


Criterion-referenced testing (CRT) is perhaps the most significant development in the evaluation of instruction since norm-referenced testing (NRT) was implemented on a large scale in the early 1900s. CRT differs from NRT in the following ways:

"Norm-referenced measures are those which are used to ascertain an individual's performance in relation to the performance of other individuals on the same measuring device. . . . Criterion-referenced measures [are used] to ascertain an individual's status with respect to some criterion, i.e., performance standard." (Popham and Husek, 1969)

The former are used to make decisions about individuals; the latter, about individuals and treatments. Glaser (1963) adds that NRT provides "information about the capability of a student compared with the capabilities of other students," while CRT provides "explicit information on what the individual can and cannot do."

Cox (1971) feels that "it is possible for a single test to yield both norm-referenced and criterion-referenced information." This position appears to oppose that held by Glaser, who feels that the two kinds of test differ in the choice of items. Many researchers (Adams, 1974; Cox, 1971; Glaser, 1963; Popham and Husek, 1969) do agree, however, that traditional item analysis information (difficulty and discrimination indices) and test characteristics (reliability and validity) have different meanings in CRT than they do in NRT. That is, decisions on the value of a given item or the worth of a given test would be different in the two applications. For example, Cox and Glaser both note that NRT items must discriminate between individuals on a single test. Therefore, items with difficulty levels of 1.00 or discrimination indices of 0.00 are useless in a norm-referenced test. A criterion-referenced test, however, is designed to make it "generally difficult for those taking it before training and generally easy after training" (Glaser, 1963). Therefore, items that are useless in NRT would be retained in CRT if they are answered correctly after training but answered incorrectly before, i.e., if they provide pretest/posttest discrimination.
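The contrast can be made concrete with a small sketch. It assumes responses are coded 1 (correct) and 0 (incorrect) and uses the pretest/posttest index that Cox (1971) describes (percent passing on the posttest minus percent passing on the pretest); the function names and the data are hypothetical.

```python
def difficulty(responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def pre_post_index(pretest, posttest):
    """Proportion passing on the posttest minus proportion passing on the pretest."""
    return difficulty(posttest) - difficulty(pretest)


# Hypothetical item: every learner misses it before training and answers it after.
pre = [0, 0, 0, 0, 0]
post = [1, 1, 1, 1, 1]

print(difficulty(post))           # 1.0 -> no spread among students on the posttest alone
print(pre_post_index(pre, post))  # 1.0 -> maximal pretest/posttest discrimination
```

A norm-referenced analysis of the posttest alone would discard this item, while the criterion-referenced index marks it as ideal.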


Models 


The Dichotomous Outcomes Model. The ideal CRT is one which yields a single, unambiguous answer to the question: does the learner possess the skill being tested? This ideal is well described by Adams (1974) as the "Dichotomous Outcomes Model" (DOM). In this model, a learner may be either in the mastery state or the non-mastery state, exclusively. On an ideal, valid test item, the learner will give a correct response if he/she is in the mastery state and an incorrect response if he/she is in the non-mastery state. Adams states that an "error of testing occurs whenever learner performance on an item does not reflect his true competence in the trait in question."

Thus, two types of errors can occur. A Type I error (in Adams' scheme [1]) occurs when the learner is in the non-mastery state but gives a correct response on a valid item. A Type II error occurs when the learner is in the mastery state but gives an incorrect response on a valid item. The goal of the test designer, therefore, is to minimize the probability of these errors by requiring the learner to respond to a sufficiently large number of items to assure reliability, yet to maximize the cost-effectiveness of the testing procedure by keeping the number of items as small as possible. A CAT system that realizes these goals has been designed by Ferguson (1971) and will be discussed later in this paper.
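The trade-off between reliability and test length can be illustrated with a simple probability sketch. It is not Adams' own formulation: it assumes, hypothetically, that masters and non-masters each answer every valid item correctly with a fixed probability, independently across items, and that mastery is declared when at least k of n items are answered correctly.

```python
from math import comb


def prob_at_least(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def misclassification(n, k, p_master, p_nonmaster):
    """Test-level error rates when mastery is declared for k or more correct out of n.

    p_master: chance a master answers an item correctly (each miss is an Adams Type II error).
    p_nonmaster: chance a non-master answers correctly (each hit is an Adams Type I error).
    Returns (P non-master declared a master, P master declared a non-master).
    """
    false_mastery = prob_at_least(n, k, p_nonmaster)
    false_nonmastery = 1 - prob_at_least(n, k, p_master)
    return false_mastery, false_nonmastery


# Hypothetical rates: masters slip 10% of the time, non-masters guess right 25% of the time.
for n, k in [(1, 1), (5, 3), (10, 6)]:
    fm, fn = misclassification(n, k, p_master=0.90, p_nonmaster=0.25)
    print(f"n={n:2d} cutoff={k}: false mastery={fm:.3f}, false non-mastery={fn:.3f}")
```

Both error rates fall as items are added, which is the reliability side of the trade-off; the cost side is simply n, the number of items each learner must answer.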


Domain-Referenced Testing. An important field that is a sibling to CRT is Domain-Referenced Testing (DRT). Hively (1974a) differentiates the two as follows:

"The world of psychometrics may be seen as a contrast between Domain-Referenced Testing and Norm-Referenced Testing. The distinction is essentially the same as the one Robert Glaser made between Norm-Referenced Testing and Criterion-Referenced Testing. But the term "criterion" lends itself to misinterpretation. It carries surplus associations to mastery learning that are best avoided by using the more general term "domain" instead. Most people who talk about Criterion-Referenced Testing assume that the technology of Domain-Referenced Testing exists, but they often do not fully recognize what that would imply." (page 5)

1. These two types of errors are also described by Ferguson (1971), but the numbers of the types are switched. That is, Adams' Type I error corresponds to Ferguson's Type II, and Adams' Type II corresponds to Ferguson's Type I.

Hively further clarifies DRT theory with the diagram in Figure 1.

It is this author's opinion that the distinction between CRT and DRT is most important when working with the cognitive and affective domains, where the universe of target behaviors can indeed be abstract and infinite. In the psychomotor domain, and even in some applications in the cognitive domain, the universe of target behaviors can usually be much more clearly defined and can approach a concrete domain, thereby minimizing the distinction between CRT and DRT for these behaviors. The problem seems to be one of the precision with which the behavioral objective can be stated.

Hively (1974a) and Baker (1974) both emphasize the importance of transfer in constructing items for inclusion in a test domain. The goal of the DRT constructor, according to Hively, is "to create an extensive pool of items that represents, in miniature, the basic characteristics of some important part of the original universe of knowledge. . . . The basic notions that guide this activity are those of generalization, transfer, and subject matter structure."


ABSTRACT UNIVERSE OF TARGET BEHAVIORS
        |
        v
CONCRETE DOMAIN OF ITEMS
        |
        v
Sample of items presented to a particular student on a particular occasion


Figure 1
Hively's Domain-Referenced Testing Model
(after Hively, 1974b)
Mathematical Interpretations. Millman (1974) and Ferguson (1971) have both worked to interpret CRT models in mathematical terms. Their work provides means for implementing the DRT and DOM models in real testing situations.

Millman (1974) models potential testing situations as a three-dimensional matrix of items, examinees, and occasions. Items are performances that are "unambiguously scoreable as either correct, incorrect, or not attempted," i.e., their outcomes are dichotomous. Occasions are "observations designed to detect the growth or change in which we are interested." When examinations are scored, the percent of items correct is judged against a passing standard, but allowance is made for the error of testing by computing the "Uncertainty Band" (UB) as follows:

$$ UB = \pm 2\sqrt{\frac{P_0(1 - P_0)}{n} \cdot \frac{N - n}{N - 1}} \tag{1} $$

where N is the number of items in the domain,
n is the number of items in the test, and
P0 is the passing standard, expressed as a proportion (e.g., .80 for 80 percent).

It is interesting to note that as the number of items in the domain (N) approaches infinity, the term (N - n)/(N - 1) approaches 1, and Equation (1) then simplifies to:

$$ UB = \pm 2\sqrt{\frac{P_0(1 - P_0)}{n}} \tag{2} $$

Millman claims that "when scores fall outside of the Uncertainty Bands, correct decisions [on the learner's mastery state] are made over 95% of the time."
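Equations (1) and (2) translate directly into a scoring routine. The sketch below is an illustration only: it treats the passing standard as a proportion, applies the band as a plus-or-minus half-width around P0, and reports a score inside the band as uncertain; the function names and the 25-item example are hypothetical.

```python
from math import sqrt


def uncertainty_band(p0, n, N=None):
    """Half-width of Millman's Uncertainty Band, Equation (1); N=None uses Equation (2)."""
    correction = 1.0 if N is None else (N - n) / (N - 1)  # finite-domain factor
    return 2 * sqrt(p0 * (1 - p0) / n * correction)


def classify(correct, n, p0, N=None):
    """Compare the proportion correct with the passing standard p0, allowing for the band."""
    score = correct / n
    band = uncertainty_band(p0, n, N)
    if score > p0 + band:
        return "mastery"
    if score < p0 - band:
        return "non-mastery"
    return "uncertain"


# Hypothetical 25-item test with an 80 percent passing standard: band = .16, i.e. .64 to .96.
for correct in (25, 20, 15):
    print(correct, classify(correct, 25, 0.80))
# prints, one per line: 25 mastery / 20 uncertain / 15 non-mastery
```

Passing a finite domain size N shrinks the band, and the band vanishes when the test contains the whole domain (n = N).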

Ferguson (1971) developed a much more generalized mathematical interpretation of the DOM. His interpretation uses two test scores, p0 and p1 (p0 > p1), that are each percentages of correct responses expressed as decimals. A learner is said to have "sufficient proficiency" (mastery) on the skill being tested if his/her score is greater than p0, and "insufficient proficiency" (non-mastery) if the score is less than p1.

Ferguson then identified the two types of errors discussed by Adams [2]. He defined α as the probability that a Type I error will occur, that is, the probability that a learner with sufficient proficiency will be incorrectly classified as having insufficient proficiency by the test results. The probability that a Type II error will occur was defined as β.

The test administrator or developer could then assign values to p0, p1, α, and β and determine the learner's proficiency to any desired degree of accuracy as follows. After each item is administered, a score, S, is computed using the formula:

$$ S = c \log_e \frac{p_1}{p_0} + w \log_e \frac{1 - p_1}{1 - p_0} \tag{3} $$

where c is the number of items answered correctly, and
w is the number of items answered incorrectly.

The learner is classified as having sufficient proficiency if

$$ S \le \log_e \frac{\beta}{1 - \alpha} \tag{4} $$

and as having insufficient proficiency if

$$ S \ge \log_e \frac{1 - \beta}{\alpha} \tag{5} $$

If neither inequality (4) nor (5) is true, i.e., if

$$ \log_e \frac{\beta}{1 - \alpha} < S < \log_e \frac{1 - \beta}{\alpha} \tag{6} $$

another test item is administered.

2. Note once again that Ferguson's Type I error corresponds to Adams' Type II, and Ferguson's Type II corresponds to Adams' Type I.
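Equations (3) through (6) have the form of Wald's sequential probability ratio test. A minimal Python sketch follows; it assumes p1 < p0, so correct answers lower S and incorrect answers raise it, and the parameter values and function names are hypothetical rather than Ferguson's.

```python
from math import log


def ferguson_score(correct, incorrect, p0, p1):
    """Equation (3): S = c*log(p1/p0) + w*log((1-p1)/(1-p0)), natural logarithms."""
    return correct * log(p1 / p0) + incorrect * log((1 - p1) / (1 - p0))


def classify(correct, incorrect, p0, p1, alpha, beta):
    """Apply inequalities (4) to (6) after each response."""
    s = ferguson_score(correct, incorrect, p0, p1)
    if s <= log(beta / (1 - alpha)):   # inequality (4)
        return "sufficient"
    if s >= log((1 - beta) / alpha):   # inequality (5)
        return "insufficient"
    return "administer another item"   # inequality (6)


# Hypothetical parameters: p0 = .85, p1 = .60, alpha = beta = .20.
params = dict(p0=0.85, p1=0.60, alpha=0.20, beta=0.20)
responses = [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]  # 1 = correct, 0 = incorrect
c = w = 0
for r in responses:
    c, w = c + r, w + (1 - r)
    decision = classify(c, w, **params)
    print(c, w, decision)
    if decision != "administer another item":
        break
```

With these assumed values the loop stops after the eighth response (seven correct, one incorrect) with "sufficient"; the single early miss costs several extra items.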


bo 
As an example of Ferguson's scheme, consider an exam with α = .20 and fixed values of p0, p1, and β. With these values, the graph in Figure 2 can be constructed to illustrate how a learner's test results would be used in determining proficiency. Note that the learner's proficiency state cannot be classified after just one response is made, because of the position of the "Uncertainty Band" for the values of p0, p1, α, and β chosen. At least two items must be answered incorrectly for a learner to be classified as possessing insufficient proficiency, and several must be answered correctly for the opposite classification to be made. By changing the values of the variables, the position of the Uncertainty Band may be altered. The implementation by Ferguson of these mathematical scoring algorithms into a sophisticated CAT system is discussed later in this report.


Figure 2
Ferguson's Method for Determining Proficiency on a Criterion-Referenced Test
(Ferguson, 1971, p. 3)
[Graph; horizontal axis: number of items tested to date.]
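The effect of the parameters on the position of the Uncertainty Band can be explored numerically. This sketch applies Equations (3) through (5) with hypothetical values for p0, p1, and β (only α = .20 comes from the example above) and counts how many consecutive incorrect or consecutive correct responses are needed before a classification is possible.

```python
from math import log


def minimum_items(p0, p1, alpha, beta):
    """Fewest all-incorrect and all-correct response runs that leave the Uncertainty Band."""
    step_wrong = log((1 - p1) / (1 - p0))  # change in S per incorrect answer (> 0 if p1 < p0)
    step_right = log(p1 / p0)              # change in S per correct answer (< 0 if p1 < p0)
    upper = log((1 - beta) / alpha)        # boundary for insufficient proficiency, (5)
    lower = log(beta / (1 - alpha))        # boundary for sufficient proficiency, (4)
    wrong = 1
    while wrong * step_wrong < upper:
        wrong += 1
    right = 1
    while right * step_right > lower:
        right += 1
    return wrong, right


# alpha = .20 as in the example; p0, p1 and beta are hypothetical.
print(minimum_items(0.85, 0.60, 0.20, 0.20))  # (2, 4)
print(minimum_items(0.85, 0.60, 0.05, 0.05))  # (4, 9): stricter error limits widen the band
```

Lowering α and β widens the band, so more responses are needed before either classification can be made; that is the trade-off between accuracy and test length in Ferguson's model.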
COMPUTER-ASSISTED TESTING


Introduction


Computer-assisted testing (CAT) is one of the fastest growing applications of instructional computing. Constructing tests by computer is a relatively straightforward process and can be shown to be cost-effective (Ansfield, 1973; Menne and Lust¢raat, 1974; Prosser, 1975). Lippey (1973) enumerates the major benefits of CAT as follows:

(1) reduces clerical chores required of an instructor,

(2) provides error-free text,

(3) allows the educator to concentrate on content rather than the mechanical aspects of test construction,

(4) eliminates the problem of securing test items from premature release if the item bank is sufficiently large, and

(5) allows input from many users through centralized collection of items, thus improving the quality of the items through experience.

A large variety of CAT systems are currently in use, from those that store only item characteristics (ETS, 1974) to those that construct and administer tests through an interactive terminal (Ferguson, 1971). The systems discussed in this report are grouped into four major categories by their apparent level of sophistication. Systems at the first three levels generally employ batch processing and include, respectively, systems that store and print teacher-constructed exams, those that automatically construct exams from a given item bank, and those that employ an algorithmic approach to item construction. The fourth level is characterized by interactive systems that make use of branching tests to control the sequence in which items are presented to the student.


Test Printing Systems


The simplest type of CAT system is one which does the job of a secretary by printing test questions selected by an instructor (Remondini, 1973). The items to be printed may be stored in any machine-readable format, e.g., magnetic tape, disk, or punched cards. In Remondini's system, the computer produces a single copy of the test. This is photocopied and transferred onto ditto masters for duplication. The answer sheets are corrected by a mark-sense device, and the computer is then used to produce an item analysis and update the statistical data for each item on punched cards.
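The report does not say which statistics Remondini's item analysis computed. As an assumption, the sketch below uses two classical norm-referenced indices: difficulty (proportion correct) and the upper-and-lower-thirds discrimination index that Cox (1971) calls the traditional method. The function name and class data are hypothetical.

```python
def item_analysis(total_scores, item_responses):
    """Classical item statistics for one item.

    total_scores: each student's total test score.
    item_responses: 1/0 on this item for the same students, in the same order.
    Returns (difficulty, discrimination), where discrimination is the proportion
    correct in the upper third of total scores minus that in the lower third.
    """
    n = len(total_scores)
    third = n // 3
    if third == 0:
        raise ValueError("need at least three students")
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_responses[i] for i in ranked[:third]]
    upper = [item_responses[i] for i in ranked[-third:]]
    difficulty = sum(item_responses) / n
    discrimination = sum(upper) / third - sum(lower) / third
    return difficulty, discrimination


# Hypothetical class of six students.
print(item_analysis([10, 20, 30, 40, 50, 60], [0, 0, 1, 0, 1, 1]))  # (0.5, 1.0)
```

Statistics like these are what a system of this kind would write back to each item's stored record after every administration.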


Salisnjack (1973) uses a system almost identical to Remondini's. He claims that it takes only 25 minutes to prepare two forms of a 75-item multiple-choice test with the aid of the computer. Salisnjack finds that his CAT system controls the cost of test construction, solves the problem of cheating, and reduces the "edge" provided by fraternity test files. He comments, however, that "attempts at making the complete data bank available to all students as a study guide so far have been unsuccessful; the cost of providing individual copies is too high, and copies placed in the library tend to disappear."

MENTREX Enterprises in Los Angeles is a commercial company that provides test construction services similar to those offered by the systems of Remondini and Salisnjack (Lippey, 1973). Users request tests through the mail by selecting questions from a "catalog" supplied by the company. The system can produce several forms of the same test by "scrambling" the items, or select items for the test based on "keys" specified by the user. Test masters are returned ready for duplication, along with an answer key and machine-readable answer sheets. Answer sheets are later returned to MENTREX for item analysis.
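Scrambling can be sketched as shuffling a fixed item list once per form while keeping each form's answer key aligned with its new order. This is only an illustration of the idea; how MENTREX actually implemented scrambling or key-based selection is not described in the source, and the names below are hypothetical.

```python
import random


def make_forms(items, n_forms, seed=None):
    """Produce n_forms scrambled orderings of the same items, each with its own answer key.

    items: list of (question, correct_answer) pairs.
    """
    rng = random.Random(seed)
    forms = []
    for _ in range(n_forms):
        order = list(items)
        rng.shuffle(order)
        questions = [question for question, _ in order]
        key = [answer for _, answer in order]
        forms.append((questions, key))
    return forms


bank = [("Q1", "B"), ("Q2", "D"), ("Q3", "A"), ("Q4", "C")]  # hypothetical items
for number, (questions, key) in enumerate(make_forms(bank, 3, seed=7), start=1):
    print(f"Form {number}: {questions}  key: {key}")
```

Because every form contains the same items, scores remain comparable across forms while neighboring students see different orders.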

Educational Testing Service (1974) is a unique user of CAT due to the sheer size of their operation. They have stated that two tasks are necessary before they can implement large-scale CAT use, and they do not yet see these tasks as part of the current state of the CAT art. These tasks are:

(1) "the development of detailed item classification systems," and

(2) "delineation of the professional judgements made in building a test from a group of items in detailed content, ability, and statistical specifications in terms precise enough to be translated into computer programs."

ETS currently uses a CAT system to help select items from their huge data banks. The system does not print tests, but simply returns item numbers that fit specified characteristics. Their computer record on each item includes:

(1) the item ID number,

(2) its classification,

(3) a history of its use,

(4) up to five sets of statistics,

(5) codes for security level and current activity, and

(6) twelve 15-character keywords.
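A record layout like the one ETS describes, together with a selection routine that returns item numbers rather than printed tests, might look like the following. Field names, types, and the filtering rules (active items only, an optional keyword, an optional maximum security level) are assumptions for illustration, not ETS's actual design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ItemRecord:
    """One stored item, with fields modeled on the six components listed above."""
    item_id: int
    classification: str
    usage_history: List[str] = field(default_factory=list)
    statistics: List[dict] = field(default_factory=list)  # up to five sets
    security_level: int = 0
    active: bool = True
    keywords: List[str] = field(default_factory=list)     # up to twelve

    def __post_init__(self):
        if len(self.statistics) > 5:
            raise ValueError("at most five sets of statistics")
        if len(self.keywords) > 12:
            raise ValueError("at most twelve keywords")


def select_item_numbers(bank, classification, keyword=None, max_security=None):
    """Return item ID numbers (not printed tests) that fit the requested characteristics."""
    return [
        r.item_id
        for r in bank
        if r.active
        and r.classification == classification
        and (keyword is None or keyword in r.keywords)
        and (max_security is None or r.security_level <= max_security)
    ]


bank = [  # hypothetical records
    ItemRecord(101, "algebra", keywords=["linear equations"]),
    ItemRecord(102, "algebra", security_level=3, keywords=["quadratics"]),
    ItemRecord(103, "geometry", keywords=["triangles"]),
    ItemRecord(104, "algebra", active=False, keywords=["linear equations"]),
]
print(select_item_numbers(bank, "algebra"))                  # [101, 102]
print(select_item_numbers(bank, "algebra", max_security=1))  # [101]
```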
It is interesting to note that ETS sees the demand for large national selection tests as diminishing. They feel that interactive testing is required for the future, with tests for guidance, placement, and evaluation. Their paper states that the technology for such systems exists, but that development funds are needed to make them cost-effective.
Test Construction Systems Using Item Banks


The second level of CAT is characterized by systems that construct tests from stored item banks. In addition to the benefits noted earlier, these systems provide a means for generating multiple forms of the same test. Jensen (1973) has used such a system to generate 4000 different forms for a class of 1500 students. He achieves criterion-referencing by allowing students to take a test on a specific topic as often as they like and counting only the highest grade. His philosophy in this approach is that ". . . one should ask only what one wishes the student to know, but ask it in so many different ways that the student cannot learn the items without learning the concept."
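Jensen's scheme combines two simple rules: draw a new random form from the topic's item bank for every attempt, and keep only the best grade. The sketch below assumes a flat list of items per topic and a caller-supplied score for each attempt (grading itself is not modeled); all names are hypothetical.

```python
import random


def draw_form(topic_bank, size, rng):
    """Draw a fresh random form for one attempt at a topic test."""
    return rng.sample(topic_bank, size)


def record_attempt(best_scores, student, score):
    """Keep only each student's highest grade on the topic."""
    best_scores[student] = max(best_scores.get(student, score), score)
    return best_scores[student]


rng = random.Random(42)
topic_bank = [f"item-{i}" for i in range(1, 51)]  # hypothetical 50-item topic bank
best = {}
for score in (60, 85, 70):  # three attempts, each on a newly drawn form
    form = draw_form(topic_bank, 10, rng)
    record_attempt(best, "student-1", score)
print(best["student-1"])  # 85
```

Because retakes cannot lower the recorded grade, students have an incentive to keep working until they reach the criterion.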

Prosser (1973) describes a similar test construction system but includes some figures on its cost. This system selects items from predefined "groups" that are specified by the user. To produce 1000 3-page tests, the system requires 20 seconds of CPU time and three hours of printer time, making the cost of each form about five cents.

The Classroom Teacher Support System (CTSS) was designed by IBM for the Los Angeles Unified School District (Toggenburger, 1973). This system constructs multiple-choice exams according to teacher-specified criteria such as course, category, difficulty level, behavioral level, and keywords. The system can also work with "macro" items, i.e., stories or documents followed by two to nine related questions. Toggenburger reports that CTSS currently uses an American History item bank of 8004 items that were written by 20 teachers over the period of one summer.
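Selection by teacher-specified criteria, with macro items kept intact, can be sketched as follows. Representing items as dictionaries and matching criteria by exact equality are assumptions; the source does not describe how CTSS stores or matches items, and the sample entries are hypothetical.

```python
def select_items(bank, keyword=None, **criteria):
    """Return bank entries whose fields match every criterion (e.g. course, difficulty).

    A macro item (a passage with two to nine related questions) is matched and returned
    as a single unit, so its questions are never separated from the passage.
    """
    chosen = []
    for item in bank:
        if any(item.get(name) != value for name, value in criteria.items()):
            continue
        if keyword is not None and keyword not in item.get("keywords", ()):
            continue
        chosen.append(item)
    return chosen


def question_count(items):
    """Count questions, expanding macro items into their related questions."""
    return sum(len(item["questions"]) if "questions" in item else 1 for item in items)


bank = [  # hypothetical American History entries
    {"course": "US History", "difficulty": "easy", "keywords": ["colonies"],
     "stem": "Which colony was founded first?"},
    {"course": "US History", "difficulty": "easy", "keywords": ["constitution"],
     "passage": "Excerpt from a ratification debate",
     "questions": ["Who is speaking?", "What is argued?", "Which side won?"]},
    {"course": "US History", "difficulty": "hard", "keywords": ["colonies"],
     "stem": "Compare the economies of two colonies."},
]
easy = select_items(bank, course="US History", difficulty="easy")
print(len(easy), question_count(easy))  # 2 4
```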

Ansfield (1973) has developed a system similar to CTSS called the Automatic Examination Generator (AEG). Ansfield's report on AEG includes data on cost: the total computer expense for producing four versions of a 70-item objective test with answer keys is $1.75.

One last item banking system with a somewhat unique character is one developed by Cohen and Cohen (1973). The main purpose of this CAT system is to assure no overlap in the items presented on successive administrations of a test for any one student. Cohen and Cohen have developed two versions of this system, one for batch processing in COBOL and one for interactive processing in FORTRAN.
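The bookkeeping behind such a guarantee is straightforward: remember which items each student has already received and draw new tests only from the remainder. This is a hypothetical sketch of the idea, not a rendering of the Cohens' COBOL or FORTRAN programs.

```python
import random


class NoOverlapSelector:
    """Draw tests so that no student sees the same item on successive administrations."""

    def __init__(self, bank, seed=None):
        self.bank = list(bank)
        self.seen = {}  # student -> set of items already administered
        self.rng = random.Random(seed)

    def next_test(self, student, size):
        used = self.seen.setdefault(student, set())
        fresh = [item for item in self.bank if item not in used]
        if len(fresh) < size:
            raise ValueError("not enough unused items left for this student")
        chosen = self.rng.sample(fresh, size)
        used.update(chosen)
        return chosen


selector = NoOverlapSelector(range(1, 11), seed=3)  # hypothetical 10-item bank
first = selector.next_test("student-1", 4)
second = selector.next_test("student-1", 4)
print(sorted(first), sorted(second))  # always disjoint
```

The bank size bounds the number of non-overlapping administrations: with ten items and four per test, a third test for the same student raises an error.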


Algorithmic Approaches to Item Construction


Olympia (1975) contends that standard item banking has three disadvantages:

(1) it lacks repeatability (unless the item bank is extremely large), especially when a given item always appears in a test exactly as it is stored,

(2) it requires a large amount of construction time and storage to create a usable bank, and

(3) it discourages the sharing of one program by various disciplines [3].
To overcome these drawbacks, Olympia devised a system for storing examination items in three "pools": a key-phrase pool, a statement-phrase pool, and a distractor pool. The system constructs an item by joining one member of the key-phrase pool with one member of the statement-phrase pool and then selecting a list of answers (including the correct answer) from the distractor pool. As an example, three pools for constructing items dealing with electron configurations are shown in Table 1.


Table 1
Pools for Constructing Items on Electron Configurations
(after Olympia, 1975)

Key-phrase Pool   Statement-phrase Pool                          Distractor Pool
---------------   --------------------------------------------   ---------------
Chlorine          has how many valence electrons?                (illegible)
Oxygen            has how many L-shell electrons?
Hydrogen          needs how many more electrons in order to
Magnesium           have an inert gas structure?
Helium
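Olympia's pool scheme can be sketched in a few lines using the phrases from Table 1. Only the first two statement phrases are used; the answer lookups and the 0 to 8 distractor pool are assumptions (the distractor entries in Table 1 are not legible), and the function name is hypothetical.

```python
import random

# Answer keys for two statement phrases. The element facts are standard chemistry,
# but the data layout is an assumption; integers 0-8 stand in for the distractor pool.
VALENCE = {"Chlorine": 7, "Oxygen": 6, "Hydrogen": 1, "Magnesium": 2, "Helium": 2}
L_SHELL = {"Chlorine": 8, "Oxygen": 6, "Hydrogen": 0, "Magnesium": 8, "Helium": 0}
STATEMENT_POOL = {
    "has how many valence electrons?": VALENCE,
    "has how many L-shell electrons?": L_SHELL,
}
KEY_PHRASE_POOL = sorted(VALENCE)
DISTRACTOR_POOL = list(range(9))


def build_item(rng, n_distractors=4):
    """Join a key phrase to a statement phrase, then draw answers from the distractor pool."""
    key = rng.choice(KEY_PHRASE_POOL)
    statement = rng.choice(sorted(STATEMENT_POOL))
    correct = STATEMENT_POOL[statement][key]
    wrong = rng.sample([d for d in DISTRACTOR_POOL if d != correct], n_distractors)
    options = wrong + [correct]
    rng.shuffle(options)
    return f"{key} {statement}", options, correct


rng = random.Random(5)
for _ in range(3):
    stem, options, correct = build_item(rng)
    print(stem, options, "answer:", correct)
```

Five key phrases and two statement phrases already yield ten distinct stems, and each stem can carry many different option sets.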


Denney (1973) describes a system similar to Olympia's. This system stores a multiple-choice question as a stem with up to seven distractors. With this data, the computer can construct 245 different questions consisting of a correct choice and four distractors. If the order of the five alternatives is randomized, up to 29,400 different variations of the same question can be generated.

3. This author feels that the example systems discussed in the previous two categories demonstrate capabilities which clearly contradict Olympia's third objection.
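The jump from 245 questions to 29,400 variations is the number of orderings of five alternatives, 5! = 120, multiplied by 245. The check below takes the 245 figure from Denney as given rather than deriving it.

```python
from math import factorial

questions = 245     # distinct questions reported by Denney (1973)
alternatives = 5    # one correct choice plus four distractors
orderings = factorial(alternatives)
print(orderings)              # 120
print(questions * orderings)  # 29400
```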

Heines (1974) created an interactive CAT system that randomly generates data to complete item forms or selects one of four previously defined item variations. Regardless of the item generation scheme, the system assured that no student would be presented with the same item on successive administrations of the test. This system is also interesting in that it was introduced by instructions on audio cassette, tied to diagrams presented via a slide projector under student control, and designed to provide an interactive environment for the instructor as well as the student.
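An item form pairs a fixed template with rules for generating the variable data. The template, value ranges, and function name below are hypothetical; the sketch only illustrates random completion of a form together with the no-repeat rule attributed to Heines' system.

```python
import random


def multiplication_item(rng, seen, max_tries=1000):
    """Complete a hypothetical item form 'What is A x B?' with random data.

    Stems already in `seen` (items this student has had before) are rejected, so the
    same item is never presented twice to one student.
    """
    for _ in range(max_tries):
        a, b = rng.randint(2, 12), rng.randint(2, 12)
        stem = f"What is {a} x {b}?"
        if stem not in seen:
            seen.add(stem)
            return stem, a * b
    raise RuntimeError("item form exhausted for this student")


rng = random.Random(11)
seen_by_student = set()
for _ in range(3):
    print(multiplication_item(rng, seen_by_student))
```

This form admits 121 distinct items, so storage holds one template instead of 121 stored items.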


Interactive, Branching Tests


Ferguson (1971) defines a branching test as "any instrument designed to measure a set of skills or objectives by routing the examinee to items neither too easy nor too difficult for him to solve." A simple example of this technique was developed by Hansen (1969) and is shown in Figure 3. In this scheme, Item 1 is presented to the student, and he/she is then branched to Item 2 if Item 1 is answered correctly and to Item 6 if it is answered incorrectly. Item 1 is designed to have a difficulty index of .50, and each successive item is designed to have a difficulty differential of ±.10 from the preceding item. Thus, the most difficult item in the tree (Item 4) will have a difficulty index of .80, and the easiest item (Item 13) will have an index of .20. Hansen found that this scheme is significantly more reliable than the traditional classroom test and is effective at reducing test anxiety.

Figure 3
Hansen's Sequential Item Tree Network
(Hansen, 1969, p. 212)
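The arithmetic of the tree can be traced with a small sketch. It assumes four items per path, as the Item 4 example suggests, and treats a higher difficulty index as a harder item, as in the text; it models only the ±.10 branching rule, not Hansen's actual item numbering, and the function name is hypothetical.

```python
def hansen_path(responses, start=0.50, step=0.10):
    """Difficulty index of each item presented, given a sequence of 1/0 responses.

    A correct answer branches to an item .10 more difficult; an incorrect answer
    branches to one .10 easier. The last response only scores the final item.
    """
    difficulties = [start]
    for correct in responses[:-1]:
        change = step if correct else -step
        difficulties.append(round(difficulties[-1] + change, 2))
    return difficulties


print(hansen_path([1, 1, 1, 1]))  # [0.5, 0.6, 0.7, 0.8]
print(hansen_path([0, 0, 0, 0]))  # [0.5, 0.4, 0.3, 0.2]
print(hansen_path([1, 0, 1, 0]))  # [0.5, 0.6, 0.5, 0.6]
```

An all-correct path ends three steps above the starting item and an all-incorrect path three steps below it, which is why the extremes of the tree lie .30 on either side of .50.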

The theoretical aspect of Ferguson's work (1971) has already been discussed at length. By comparing Ferguson's work to that of the other CAT researchers discussed so far, it can be seen that Ferguson is one of the few researchers to have created a CAT system as a means for implementing a well-developed theory of evaluation. This system tested objectives in the IPI (Individually Prescribed Instruction) mathematics curriculum, a program that already made use of comprehensive paper-and-pencil testing and therefore provided a useful measure of the system's success. Ferguson administered tests that utilized his item sampling and evaluation techniques (discussed previously) and then branched students to test items on either more advanced or preliminary objectives based on the results. By this process, Ferguson was able to pinpoint a student's competency level with any desired accuracy and then prescribe instruction to fit the student's needs. Ferguson found that his branching CAT system yielded classification decisions that were "consistent with subsequent paper-and-pencil test outcomes approximately 99% of the time." He conjectured that "by employing an item sampling technique that permits control over classification errors, the CAT model may increase reliability."
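The advantage of branching between objectives can be illustrated with a simplified sketch. It replaces Ferguson's actual branching rule (summarized in the annotated bibliography) with plain bisection over an ordered sequence of objectives and assumes mastery is cumulative; the function name and the simulated student are hypothetical.

```python
def locate_competency(num_objectives, has_mastered):
    """Find the first objective (in difficulty order) the student has not mastered.

    has_mastered(k) stands in for a full mastery test on objective k, for example the
    sequential scoring rule of Equations (3) to (6). Assumes mastery is cumulative: a
    student who masters objective k has mastered every easier objective.
    Returns (index of first unmastered objective, list of objectives tested).
    """
    low, high = 0, num_objectives  # the answer lies in [low, high]
    tested = []
    while low < high:
        middle = (low + high) // 2
        tested.append(middle)
        if has_mastered(middle):
            low = middle + 1       # branch to a more advanced objective
        else:
            high = middle          # branch to a more preliminary objective
    return low, tested


# Hypothetical student who has mastered objectives 0-4 of a 16-objective sequence.
level, tested = locate_competency(16, lambda k: k < 5)
print(level, tested)  # 5 [8, 4, 6, 5]
```

Four objectives are tested here instead of the six needed to climb one objective at a time from the start, and the saving grows with the length of the sequence.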

Ferguson discussed three "suggested refinements" to his model. First, he felt that testing must be representative and that this was not always guaranteed by random sampling. He therefore recommended a combination of randomly constructed items with domain-referenced item forms. Second, Ferguson felt that research is needed to achieve a compromise between minimizing Type II errors, which he considers the more serious (these occur when the examinee is a non-master but the test results indicate mastery), and reducing the number of items presented (for expediency) by allowing the error parameters to increase. Third, he noted that all examinees started at the same point, and therefore highly competent examinees did problems that were too easy while incompetent examinees did ones that were too hard. He suggests that examinees might be allowed to choose their own starting points. Ferguson concludes, "by tailoring the test to individuals, fewer objectives need to be tested and the objectives that are tested are less subject to errors of proficiency classification."
CONCLUSIONS


The fields of criterion-referenced and computer-assisted testing are still in their infancies. The literature examined for this report shows that considerable differences of opinion exist on the meaning and uses of CRT, and that very few CAT systems are based on sound theories of evaluation. O'Reilly, Gorth, and Pinsky (1973) comment on the current state of the CAT art as follows:

"[Current CAT efforts] tend to be largely superficial, poorly grounded in relevant evaluation models and test theory, and tend to continue a questionable school and classroom practice. . . . [They] focus on the mechanics of test production via machine, a tendency which works against the need to maintain a precise relationship between the intent of instruction and the measurement process." (page 34)


This author feels that CRT and CAT may help each other in several ways:

(1) Item generation techniques may be refined to allow more comprehensive evaluation of domains by making more items available.

(2) Item sampling algorithms may be used to achieve more representative tests from existing domains.

(3) Branching tests may be utilized to arrive at the most cost-effective method for evaluating performance.

(4) Test models may be simulated to ascertain their feasibility.

(5) Mathematical models may be developed to help define and standardize the criteria by which performance is judged.

(6) CRT can be used more widely as a valid theory to aid the design of CAT systems.

At present, CRT is lacking in demonstrable, practical applications, while CAT is lacking in sound instructional theory. Researchers who synthesize the best characteristics of these two fields may find that they complement each other smoothly and can contribute heavily to each other's development.


ANNOTATED BIBLIOGRAPHY


Introduction


This bibliography is broken down into two sections, CRT and CAT. The only papers that really fall into both categories are those by Ferguson, and these are categorized under CRT. Within each category, all papers are listed in alphabetical order.


SECTION A: Criterion-Referenced Instruction


1. Adams, E. N. On scoring a mastery learning control test. Journal of Computer-Based Instruction 1(2): 54-58, November 1974.

Defines the Dichotomous Outcomes Model and its implications for CRT. Differentiates the two types of errors of testing that can occur and explains their relationships (see also Ferguson on errors of testing). Philosophy of test design: the value to be maximized is cost-effectiveness; the value to be minimized is "regret" (the cost of classifying a master as a non-master and vice versa).


2. Baker, Eva L. Beyond objectives: domain-referenced tests for evaluation and instructional improvement. Educational Technology 14(6): 10-16, June 1974.

Argues that objectives consist of substance and form, the former defining "the content to which the learner is to respond" and the latter how the learner is to display what he/she learned. Claims that "overemphasizing either . . . may inhibit the improvement of instructional practice." Feels that "most objectives do not present sufficient cues regarding what a teacher should alter in instruction to facilitate improved learning," but "DRT can supply both the data needed for assessment of instructional programs and information suitable for feedback to teachers to facilitate planning." Key is transfer.

States that domains should be prepared with the following considerations:

(1) Domain description: "a general, but operational, statement of the behavior and content upon which the test focuses."

(2) Content limits: "a set of rules of content eligible for inclusion in the test items or in instruction."

(3) Criteria for constructed responses: "rules by which the adequacy of responses to the item can be judged."

(4) Distractor domain: "specifies the rules for inclusion of wrong-answer alternatives."

(5) Format: "a description of the form in which the items will be presented to students."

(6) Directions: "a facsimile of directions provided the learner in the test situation."

(7) Sample items: "intended as a representative of the class of responses desired."


Cox, Ricpard Cc, Evaluative aspects of criterione 


referenced measures. In Popnams Wed, Cede)» Criterione 


Referenced Measurement, DP, 67975, Educational 


Technology Publications, Englewood Clitts, N,J,, 1971, 


Discusses uses of reliability, validity, and item
analysis data in CRT. Cites two methods of item
analysis: (1) upper and lower thirds (traditional
method), and (2) percent passing on posttest minus
percent passing on pretest. Also discusses the
sequentially scaled achievement test, where a pupil
answers all questions up to a certain point correctly
(his/her level of achievement) and misses all items
beyond that point. (This is the "ideal" test described
by Popham and Husek.)
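The two item-analysis methods can be sketched as follows. This is a minimal Python illustration, assuming responses are coded 1 (correct) or 0 (incorrect); the function names are hypothetical, and the upper/lower-thirds index is taken here as the difference in proportion correct between the top and bottom thirds of examinees ranked by total score.

```python
from typing import Sequence

def proportion_correct(responses: Sequence[int]) -> float:
    """Fraction of 1s in a list of 0/1 item responses."""
    return sum(responses) / len(responses)

def pre_post_index(pretest: Sequence[int], posttest: Sequence[int]) -> float:
    """Proportion passing on the posttest minus proportion passing on the pretest."""
    return proportion_correct(posttest) - proportion_correct(pretest)

def upper_lower_index(item: Sequence[int], totals: Sequence[float]) -> float:
    """Traditional discrimination index: p(upper third) - p(lower third).

    `item` holds one 0/1 response per examinee; `totals` holds the
    matching total test scores used to rank the examinees."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = len(order) // 3
    if k == 0:
        raise ValueError("need at least three examinees")
    lower = [item[i] for i in order[:k]]
    upper = [item[i] for i in order[-k:]]
    return proportion_correct(upper) - proportion_correct(lower)
```

If nearly everyone passes an item on the posttest, the upper/lower index computed on posttest scores drops toward zero, while the pre/post index can still be large; this is one way to see why the second method suits criterion-referenced items.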


Ferguson, Richard L. Computer-assisted criterion-referenced
testing. Learning Research and Development
Center, University of Pittsburgh, Working Paper 49,
March, 1978.


Background paper for Ferguson's later work (1971).
Describes his item generation, branching, and test
scoring algorithms, and provides data on a comparison
of the computer-assisted test with the corresponding
paper-and-pencil tests. The branching technique involves
comparison of the percentage of items answered
correctly on a given objective (p) with the passing
criterion for that objective (p0). If the objective is
mastered and p ≥ (1 − .5 × p0), the student is branched
to the most difficult untested objective in the
sequence. If the objective is mastered but
p < (1 − .5 × p0), the student is branched to a more
difficult objective midway between those not already
tested. If the objective is not mastered, a similar
procedure is used to branch the student to an easier
objective. Ferguson reports on a simulation comparing
this branching technique to two others: (1) branching
up one objective in the sequence if the objective
currently being tested was mastered and down two
objectives if it was not, and (2) branching up two and
down one (these techniques are similar to Hansen's).
Results showed that Ferguson's branching technique
required fewer test items than the other two in almost
all cases.
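The branching rule can be sketched as a single decision function. This is a hedged Python illustration, not Ferguson's program. It assumes that objectives are indexed 0 to n-1 in order of increasing difficulty and that "midway" means the middle of the untested objectives in the chosen direction. The caller supplies the cut-off separating a strong pass from a marginal one (`strong_pass`) and the mirror-image cut-off for failures (`strong_fail`); both are assumptions of this sketch rather than values from the source.

```python
from typing import Iterable, Optional

def next_objective(current: int, p: float, p0: float, untested: Iterable[int],
                   strong_pass: float, strong_fail: float) -> Optional[int]:
    """Choose the next objective to test after scoring `current`.

    p  -- proportion of items answered correctly on the current objective
    p0 -- passing criterion for that objective
    Returns None when no untested objective remains in the chosen direction.
    """
    mastered = p >= p0
    if mastered:
        candidates = sorted(i for i in untested if i > current)   # harder objectives
    else:
        candidates = sorted(i for i in untested if i < current)   # easier objectives
    if not candidates:
        return None
    if mastered and p >= strong_pass:
        return candidates[-1]                 # jump to the hardest untested objective
    if not mastered and p < strong_fail:
        return candidates[0]                  # jump to the easiest untested objective
    return candidates[len(candidates) // 2]   # otherwise go midway among the untested ones
```

For example, with objectives 0 to 9 and objective 4 just tested, a perfect score sends the student to objective 9, while a marginal pass sends the student to objective 7. Jumping to the extremes on clear-cut results is what lets this kind of rule locate a student's level with fewer items than fixed one- or two-step moves.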


Ferguson, Richard L. Computer assistance for
individualized measurement. Learning Research and
Development Center, University of Pittsburgh, March
1971.

A more comprehensive discussion of the 1978 work,
describing all aspects of Ferguson's test model and its
use in the IPI mathematics curriculum. Presents prior
research on branching tests and the full capabilities of the
current CAT system. (Specific aspects of this work are
described in detail in the body of this report.)

Garvin, Alfred D. The applicability of criterion-referenced
measurement by content area and level. In
Popham, W. James (ed.), Criterion-Referenced Measurement,
pp. 55-63, Educational Technology Publications,
Englewood Cliffs, N.J., 1971.

A slightly humorous view of CRT, proposing that
some subjects, e.g., English, need not have criterion
levels that everyone must master. Proposes the
following "general principles" on the applicability of
criterion-referenced measurement (CRM) to various
content areas:

(1) "Unless at least one of the instructional
objectives of a unit envisions a task that must
subsequently be performed at a specified level of
competence in at least some situation, CRM is
irrelevant because there is no criterion."

(2) "If public safety, economic responsibility,
or other ethical considerations demand that certain
tasks be performed only by those 'qualified' for them
by formal instruction, then CRM of the outcomes of such
instruction is clearly indicated."

(3) "In any instructional sequence where the
content is inherently cumulative and the rigor is
progressively greater, CRM should be used to control
entry to successive units."

(4) "There are certain content areas to which
criteria do apply but not everyone need meet them."


Glaser, Robert. Instructional technology and the
measurement of learning outcomes: some questions.
American Psychologist 18:519-521, 1963.


Defines norm-referenced and criterion-referenced
measures and the uses of achievement measures in
general. Contends that the difference between the two
types of measures is determined by the selection of
items, and discusses the implications of this contention
for the interpretation of observed discrimination
indices.


Glaser, Robert. A criterion-referenced test. In
Popham, W. James (ed.), Criterion-Referenced
Measurement, pp. 41-51, Educational Technology
Publications, Englewood Cliffs, N.J., 1971.


An extremely detailed discussion of the
characteristics of CRT and its differences from NRT.
Contends that "the distinction is found by examining
(a) the purpose for which the test was constructed,
(b) the manner in which it was constructed, (c) the
specificity of the information yielded about the domain
of instructionally relevant tasks, (d) the
generalizability of test performance information to the
domain, and (e) the use to be made of the obtained test
information". Also includes a fascinating reprint from
Edward L. Thorndike's classic work Educational
Psychology (1913), which shows that the problem of
establishing criteria against which to measure student
achievement is indeed a basic one in instructional
theory.


Hambleton, Ronald K., and William P. Gorth. Criterion-referenced
testing: issues and applications.
University of Massachusetts School of Education,
Amherst, September, 1971.


Defines reliability, validity, and item analysis
and discusses the use of each in criterion-referenced
measures. Describes the uses of test results for
(1) individual assessment, (2) teaching material
assessment, and (3) evaluative material assessment
(implicitly). Presents descriptions of two criterion-referenced
measurement systems, one of which is a CAT
system. Contains a comprehensive bibliography.


Hively, Wells. Introduction to domain-referenced
testing. Educational Technology 14(6):5-10, June
1974a.


Defines DRT and contrasts it with CRT. Defines
reliability and validity in DRT as the "accuracy with
which one can estimate the probabilities of correct
performance within a concrete domain" and the "success
of generalization from performance on a concrete domain
to performance in the larger universe of knowledge from
which the domain was generated", respectively.
Differentiates NRT and DRT and points out that both are
useful in different applications. States that item
analysis as used in NRT does not consider the validity
of items, and suggests that items on norm-referenced
tests "may be selected for their ease of administration
rather than for their formal correspondence to the
original universe".


Hively, Wells. Some comments on this issue.
Educational Technology 14(6):60-64, June 1974b.


Comments on all articles in this special issue of
Educational Technology magazine on DRT, clarifying some
points and contesting others. Provides a clear
understanding of DRT theory and an interesting
discussion of many facets of this work.


Lindvall, C.M., and Anthony J. Nitko. Criterion-referenced
testing and the individualization of
instruction. Learning Research and Development Center,
University of Pittsburgh, paper presented at the
annual meeting of the National Council on Measurement
in Education, February, 1969.

Excellent discussion of the differences between
CRT and NRT, with the characteristics of CRT clearly
presented. Concise and easy to understand.


Millman, Jason. Sampling plans for domain-referenced
tests. Educational Technology 14(6):17-21, June 1974.


Presents a mathematical model similar to Ferguson's
in which an "uncertainty band" (UB) is computed and used to
judge the reliability of a DRT. The UB is a function
of the number of items in the domain, the number of
items in the test, and the passing standard in percent.
Generalizes the computation for situations in which
subtests are used. Discusses the purposes of comparing
scores on two or more DRTs and enumerates sampling
considerations for constructing DRTs.
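Millman's exact formula is not reproduced in this annotation. As a hedged sketch, assume the band is z standard errors of an observed proportion on either side of the passing standard, with the finite-population correction supplying the dependence on domain size. The function names and the default z = 2 are illustrative assumptions.

```python
import math

def uncertainty_band(domain_size: int, test_length: int, passing_standard: float,
                     z: float = 2.0) -> float:
    """Half-width of an assumed uncertainty band around the passing standard.

    Uses the standard error of a proportion sampled without replacement
    from a finite domain: sqrt(p(1-p)/n * (N-n)/(N-1))."""
    if not 0 < test_length <= domain_size:
        raise ValueError("test length must be between 1 and the domain size")
    p = passing_standard
    fpc = (domain_size - test_length) / (domain_size - 1) if domain_size > 1 else 0.0
    return z * math.sqrt(p * (1 - p) / test_length * fpc)

def classify(score: float, domain_size: int, test_length: int,
             passing_standard: float, z: float = 2.0) -> str:
    """Label a proportion-correct score as master, nonmaster, or uncertain."""
    half = uncertainty_band(domain_size, test_length, passing_standard, z)
    if score >= passing_standard + half:
        return "master"
    if score < passing_standard - half:
        return "nonmaster"
    return "uncertain"
```

Under these assumptions, a 20-item test drawn from a 1,000-item domain with an 80 percent standard gives a band of about ±.18, so a score of 95 percent would still fall in the uncertain zone. A test that exhausts the domain has a band of zero.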


Popham, W. James. Indices of adequacy for criterion-referenced
test items. In Popham, W. James (ed.),
Criterion-Referenced Measurement, Educational
Technology Publications, Englewood Cliffs, N.J.,
pp. 79-98, 1971.

A complex discussion of various techniques for
assessing statistical characteristics for CRT, using
the SWRL and PROBE projects (Southwest Regional
Laboratory and UCLA, respectively) as examples. Highly
technical, showing the results of using different
statistical techniques to analyze the same sets of
data.


Popham, W. James, and T.R. Husek. Implications of
criterion-referenced measurement. Journal of
Educational Measurement 6(1):1-9, 1969.


Describes the differences between CRT and NRT and
contends that the traditional methods for computing the
reliability and validity of a test are not appropriate
for CRT because these measures are based on variability
of test scores. Presents a similar argument against
traditional item analysis techniques, but admits that
"as data-processing becomes increasingly automated and
less expensive, such analyses would seem warranted in
situations where the effort is not immense". Defines
an ideal CRT as one which has a one-to-one correspondence
between score and response pattern, i.e., each score
can be achieved in only one way. (A means for
realizing this type of test is described by Cox.)
Recognizes the more typical type of CRT as a DRT.
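For a sequentially scaled test of the kind described by Cox, the one-to-one property is easy to check. This is a minimal Python sketch, assuming items are listed from easiest to hardest and responses are coded 1 or 0. The function names are hypothetical.

```python
from typing import List, Sequence

def is_sequentially_scaled(responses: Sequence[int]) -> bool:
    """True if the pattern is all 1s followed by all 0s (items ordered easiest first)."""
    score = sum(responses)
    return list(responses) == [1] * score + [0] * (len(responses) - score)

def pattern_for_score(score: int, n_items: int) -> List[int]:
    """The only response pattern that yields `score` on an ideal n-item test."""
    if not 0 <= score <= n_items:
        raise ValueError("score out of range")
    return [1] * score + [0] * (n_items - score)
```

On a typical DRT, by contrast, a score of 2 on four items can arise from six different response patterns. The score alone then does not reveal which items were missed.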


SECTION B: Computer-Assisted Testing


Ansfield, Paul J. A user-oriented computing procedure
for compiling and generating examinations. Educational
Technology 13(3):12-13, March 1973.


Description of a system in use at the University
of Wisconsin, 360/44-based, using files on magtape.
Items may be multiple choice, true/false, or "macro".
Instructor input: exam title and date, specific or
random item selection, specification of instruction
sets to be used, and number of arrangements for
multiple forms. Banks are currently available in
psychology and sociology, with business, biology, and
physics planned.
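The combination of random item selection and multiple arrangements can be sketched as below. This is an illustrative Python version, not Ansfield's procedure. It assumes the bank is a list of item identifiers, draws one random selection, and then produces the requested number of arrangements by reshuffling that selection. A fixed seed keeps the forms reproducible.

```python
import random
from typing import List, Sequence

def make_forms(bank: Sequence[str], n_items: int, n_forms: int,
               seed: int = 0) -> List[List[str]]:
    """Select n_items at random from the bank, then return n_forms orderings of them."""
    rng = random.Random(seed)              # seeded so every form can be regenerated
    selection = rng.sample(list(bank), n_items)
    forms = []
    for _ in range(n_forms):
        form = selection[:]                # the same items on every form...
        rng.shuffle(form)                  # ...in a different order
        forms.append(form)
    return forms
```

Keeping the item set fixed across arrangements means every form measures the same content, and only the order changes from one student to the next.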


Baker, Frank B. An interactive approach to test
construction. Educational Technology 13(3):13-15,
March 1973.


This system allows interactive exploration of
items at a computer terminal by searching its data bank
for keywords that match the user's input. For each
item, the system stores the item itself, its ID, a
set of keywords, a code indicating its most recent
usage, item analysis results, the total number of times
that the item has been used, and a link to the previous
version of the item. Items are screened by the
parameters supplied by the interactive user, and a table
is generated containing the number of items requested
per area, the number found per area, and the predicted
test mean, reliability, and variance. Maintenance and
analysis functions are performed in batch mode from
card input.
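The screening table can be illustrated with a short sketch. The annotation does not give Baker's formulas for the predicted statistics, so this Python version assumes the classical approximations from item statistics. The predicted mean is the sum of item difficulties, the predicted standard deviation is the sum of each item's standard deviation times its item-total correlation, and reliability is estimated with KR-20. The `Item` fields and the function names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class Item:
    item_id: str
    area: str                # content area used in the screening table
    keywords: Set[str]
    difficulty: float        # proportion answering correctly (p)
    item_total_r: float      # item-total (point-biserial) correlation

def screen(bank: List[Item], requested: Dict[str, int], keywords: Set[str]):
    """Return {area: (requested, found)} and up to the requested number of items per area."""
    matches = [it for it in bank if it.area in requested and it.keywords & keywords]
    found = Counter(it.area for it in matches)
    table = {area: (n, found[area]) for area, n in requested.items()}
    selected: List[Item] = []
    for area, n in requested.items():
        selected.extend([it for it in matches if it.area == area][:n])
    return table, selected

def predicted_statistics(items: List[Item]) -> Tuple[float, float, float]:
    """Predicted mean, variance, and KR-20 from item statistics (classical approximations)."""
    k = len(items)
    pq = [it.difficulty * (1 - it.difficulty) for it in items]
    mean = sum(it.difficulty for it in items)
    sd = sum(it.item_total_r * math.sqrt(v) for it, v in zip(items, pq))
    variance = sd ** 2
    if k < 2 or variance == 0:
        return mean, variance, float("nan")
    kr20 = (k / (k - 1)) * (1 - sum(pq) / variance)
    return mean, variance, kr20
```

The standard-deviation identity is exact only when the item-total correlations come from the test being assembled. Stored correlations from earlier administrations make the prediction an approximation.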




Brown, Willard A. Improvement of testing and course
evaluation. Journal of Research in Science Teaching
5:240-243, 1967-1968.


Description of a system designed to help detect
(1) trivial distractors, (2) erroneous answers supplied
by the instructor, (3) inconsistent answers between two
forms, (4) answers with no logical distinction, (5) bad
question stems, (6) crucial misspellings, and
(7) trivial questions. The system scores and sorts
mark-sense answer cards, computes norm statistics, and
performs an item analysis on up to four multiple forms
of a single test.
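Two of the checks listed, trivial distractors and erroneous keyed answers, can be approximated from response counts. This is a hedged Python sketch, assuming each examinee's chosen option and total score are available. The rules used here are common item-analysis heuristics, not necessarily Brown's: a distractor nobody chooses is flagged as trivial, and an item whose upper group prefers some other option to the key is flagged as a possible key error.

```python
from collections import Counter
from typing import Dict, List, Sequence

def distractor_flags(choices: Sequence[str], totals: Sequence[float],
                     options: Sequence[str], key: str) -> Dict[str, List[str]]:
    """Flag unused distractors and options the upper group prefers to the key."""
    counts = Counter(choices)
    trivial = [opt for opt in options if opt != key and counts[opt] == 0]
    # Upper group: top third of examinees by total score (at least one examinee).
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    upper = Counter(choices[i] for i in order[:max(1, len(order) // 3)])
    preferred = [opt for opt in options if opt != key and upper[opt] > upper[key]]
    return {"trivial_distractors": trivial, "possible_key_error": preferred}
```

An instructor would still inspect each flagged item by hand. For example, a distractor that the best students choose may be a second correct answer rather than a keying error.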


Brown, Willard A. A computer examination compositor
for the IBM 360/40. Western Washington State College,
1972.


Description of a system that performs the
following services: (1) stores questions on disk in
compact format, (2) outputs card images for the
compressed files, (3) allows updates to question files
in batch mode, (4) produces a catalog of questions from
the disk file, (5) produces page files of composed
exams for output in upper and lower case on the IBM
2741 communications terminal, (6) produces similar page
files of exam answers, (7) formats output or allows
this feature to be overridden, and (8) provides for
multiple testing techniques.
Buckley-Sharp, M.D. A multiple choice question banking
system. Educational Technology 13(3):16-18, March
1973.


An IBM 360/65-based batch system used in medical
colleges in the United Kingdom for filing and printing
multiple choice questions. Questions are banked after
they are used and validated, allowing other instructors
to access them. Advantages cited are saving of
instructor time and encouragement of open release of
evaluative materials to students. Contends that the
latter yields better, more directed learning.


Cohen, Perrin S., and Leila R. Cohen. Computer
generated tests for a student paced course.
Educational Technology 13(3):18-19, March 1973.


This system is designed to prevent overlap in the
items chosen for successive administrations of a
randomly generated test for any one student. The exams
may be generated in batch mode by COBOL programs or in
interactive mode by FORTRAN programs. The system allows
multiple choice, true/false, "identify and define", and
"graph" questions. The authors see the advantages of
their system as rapid exam generation and elimination
of biases due to ordering of questions, since each
question is randomly generated.
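Preventing overlap for one student amounts to sampling without replacement from the items that student has not yet seen. This is a minimal Python sketch under that assumption. The names are hypothetical, and because the source does not say how the system handles an exhausted bank, this sketch simply raises an error.

```python
import random
from typing import Dict, List, Sequence, Set

def generate_for_student(bank: Sequence[str], seen: Dict[str, Set[str]],
                         student: str, n_items: int, rng: random.Random) -> List[str]:
    """Draw n_items the student has not seen before and record them as seen."""
    history = seen.setdefault(student, set())
    fresh = [item for item in bank if item not in history]
    if len(fresh) < n_items:
        raise ValueError("not enough unseen items left for this student")
    test = rng.sample(fresh, n_items)
    history.update(test)
    return test
```

Each student keeps a separate history, so two students may receive the same item while no single student ever sees an item twice.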


Denney, Cecil. There is more to a test pool than data
collection. Educational Technology 13(3):19-20, March
1973.


An APL-based system that combines data banking and
algorithmic approaches to test construction. A
completely interactive system that includes means for
retrieving and editing objectives, activities, and
resources. A tutorial CAI program is available that
guides teachers through learning the system's use.
Output includes test copy, student response sheet,
answer key, and diagnostic information for both the
student and teacher.

Concludes with the following comments on
implementation of CAT:

(1) ". . . any innovation in education must allow a
teacher to begin at his own level of professional skill
and grow into its application as his skills improve."

(2) Quality is of the utmost importance.

(3) ". . . no matter how great the system may seem
to the originators and no matter how enthusiastically
they are able to describe it, its ultimate success will
be determined on the basis of whether the apparent
value received is greater than the perceived effort of
using it. Technology in education must be a serving
tool, not an end in itself."


Dudley, Thomas J. How the computer assists in pacing
and testing students' progress. Educational Technology
13(3):21-22, March 1973.


A test banking and statistics storing system that
offers only upper case output and no graphics, but
allows questions to be categorized by objectives up to
three levels. The test is printed by the computer and the
students' responses are entered through keypunching.
The self-pacing aspect is achieved through individualization
in a linear, modular program that requires
the student to pass one test before proceeding to the
next unit.


Educational Testing Service. Computer-assisted
assembly of tests at ETS. A paper presented at a
conference on computer-assisted test construction, San
Diego, California, October 1974.


A description of a system in use at ETS to store
item characteristics. Primary output is a list of item
numbers, and the items are retrieved manually, ordered,
typed, and printed. A prototype system modeled on that of
Willard Brown is being experimented with for storage
and retrieval of whole items. Current problems with
this system are that it has limited graphics and item
formats, and the high-speed printouts are not of
acceptable quality for reproduction. This paper also
includes criteria which ETS sees as necessary for a CAT
system that they can use to implement large-scale
computer-assisted test construction, and a statement
that interactive testing will be the way of the future.


Hansen, Duncan N. An investigation of computer-based
science teaching. In Richard C. Atkinson and H.A.
Wilson (eds.), Computer-Assisted Instruction: A Book of
Readings, Academic Press, New York, N.Y., pp. 209-226,
1969.


Describes a simple branching test technique called
the item tree network and discusses four ways of
scoring tests based on this structure. Found no
significant difference in administration time between a
17-item CAT and a 24-item conventional test. All four
scoring schemes yielded similar results, and each
yielded a reliability coefficient that was
significantly higher than that of a comparable classroom test.
Hypothesized that this increased reliability might be
due to increased dispersion at the upper and lower ends
of the scale. Found that student attitude towards the
CAT system was positive, and therefore suggested that
the system was feasible for reducing test anxiety.
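An item tree network can be sketched as a binary tree in which a correct answer leads to a harder item and an incorrect answer to an easier one. The Python below is an illustrative assumption about that structure, not Hansen's implementation. It scores a path by the simple number of items answered correctly. The annotation does not describe Hansen's four scoring schemes, so none of them is reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Node:
    item_id: str
    harder: Optional["Node"] = None   # next item after a correct answer
    easier: Optional["Node"] = None   # next item after an incorrect answer

def administer(root: Node, answer: Callable[[str], bool]) -> Tuple[List[str], int]:
    """Walk the tree, returning the items presented and the number answered correctly."""
    path: List[str] = []
    correct = 0
    node: Optional[Node] = root
    while node is not None:
        path.append(node.item_id)
        if answer(node.item_id):
            correct += 1
            node = node.harder
        else:
            node = node.easier
    return path, correct
```

Each student answers only one item per level of the tree. This is how a branched test can reach its result with fewer items than a conventional test covering the same range of difficulty.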


Hazlett, C.B. MEDSIRCH: multiple choice test items.
Educational Technology 13(3):24-26, March 1973.


Informative, detailed, step-by-step description of
how this question retrieval system works. Uses 57
descriptors, including subject areas and statistics. A
FORTRAN-based batch system with punched card entry but
many utility programs available. The paper includes a
flowchart of the program's operation. The user can supply
question IDs or a "profile" (list of descriptors
desired or not desired). Used in medical colleges in
Alberta and other Canadian sites.
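A profile of descriptors desired and not desired works as a set filter. This is a hedged Python sketch, assuming each question carries a set of descriptor codes and that a question must have every desired descriptor and none of the undesired ones. The annotation does not say whether MEDSIRCH combined descriptors in exactly this way, and the function name is hypothetical.

```python
from typing import Dict, List, Set

def retrieve(questions: Dict[str, Set[str]], desired: Set[str],
             not_desired: Set[str]) -> List[str]:
    """Return IDs of questions having every desired and no undesired descriptor."""
    return sorted(qid for qid, descriptors in questions.items()
                  if desired <= descriptors and not (not_desired & descriptors))
```

For example, a profile of {cardiology} desired and {hard} not desired would return cardiology questions except those tagged as hard.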


Heines, Jesse M. An interactive, computer-managed
model for the evaluation of audio-tutorial instruction.
Unpublished Master's Thesis, College of Education,
University of Maine, Orono, Maine, May, 1974.


Analysis of the use of a BASIC language CAT system
to evaluate students in an audio-tutorial course in
physical science for non-science majors. Provides
completely interactive environments for the instructor
as well as the students. The analysis includes data on the
system's cost, the time required by students to master
its use, and the effectiveness of its test items in
assessing student learning. Appendices include
transcripts of actual sessions at the terminal and
listings of the programs used.


Hsu, Tse-Chi, and Marthene Carlson. Test construction
aspects of the computer assisted testing model.
Educational Technology 13(3):26-27, March 1973.


An interactive system for the DECsystem-10, written
in FORTRAN. Item forms are generated for the IPI math
curriculum, similar to those generated by Ferguson.
Statistics are generated for the item forms, not for
the individual items.
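Keeping statistics at the level of the item form, rather than the individual generated item, can be sketched as follows. The item form shown (two-digit addition) and all names are illustrative assumptions and are not taken from the IPI materials.

```python
import random
from collections import defaultdict
from typing import Dict, Tuple

def addition_form(rng: random.Random) -> Tuple[str, int]:
    """Hypothetical item form: add two two-digit numbers."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"{a} + {b} = ?", a + b

class FormStatistics:
    """Accumulate proportion correct per item form, not per generated item."""

    def __init__(self) -> None:
        self.attempts: Dict[str, int] = defaultdict(int)
        self.correct: Dict[str, int] = defaultdict(int)

    def record(self, form_name: str, is_correct: bool) -> None:
        self.attempts[form_name] += 1
        self.correct[form_name] += int(is_correct)

    def difficulty(self, form_name: str) -> float:
        """Proportion correct across every item generated from this form."""
        return self.correct[form_name] / self.attempts[form_name]
```

Because a generated item may never be presented twice, per-item statistics would rest on one or two responses. Pooling the responses by form gives a usable estimate.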




Jensen, Donald D. Toward efficient, effective, and
humane instruction in large classes: student scheduled
involvement in films, discussions and computer
generated repeatable exams. Educational Technology
13(3):28-29, March 1973.
Report on the use of a CAT system to generate a
large number of forms for the same test in a large
enrollment (1500 students) course. The course is
self-paced within a week's time, i.e., one unit must be
completed each week. Tests are given every one or two
weeks. Since CAT and the opportunity to retake an exam
have been implemented, student enrollment has doubled
in three years, the modal grade has increased to an
"A", and the failure rate has dropped to less than 10%.
Recommends a 10 to 1 ratio of item bank size to
the number of items to be included on any one test.
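One way to see what a large bank-to-test ratio buys is to compute the expected overlap between forms. This sketch assumes each form of n items is drawn by simple random sampling from a bank of N items, independently of the other forms. Under that assumption, the number of items two forms share follows a hypergeometric distribution with mean n·n/N. The function names are hypothetical.

```python
from math import comb
from typing import List

def expected_overlap(bank_size: int, test_length: int) -> float:
    """Mean number of shared items between two independent random forms."""
    return test_length * test_length / bank_size

def overlap_distribution(bank_size: int, test_length: int) -> List[float]:
    """P(shared = k) for k = 0..test_length (hypergeometric)."""
    n, N = test_length, bank_size
    total = comb(N, n)
    return [comb(n, k) * comb(N - n, n - k) / total for k in range(n + 1)]
```

With a 10 to 1 ratio, for example a 20-item test drawn from a 200-item bank, two forms share about 2 items on average, which is one tenth of the test.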


Libaw, Frieda B. Constructing tests with the MENTREX
tutorial testing system. Educational Technology
13(3):30-31, March 1973.


Description of a commercially available test
construction service from MENTREX Enterprises in Los
Angeles. The system can produce scrambled tests, sort
and select items on keys, and sort and select items on
a two-dimensional matrix of keys. The system currently
handles only multiple choice items. Users request
tests through the mail from a "catalog" of test
questions. MENTREX returns an assembled test on ditto
masters or ready for offset printing, an answer key,
and machine readable answer sheets. The answer sheets
are returned to MENTREX for item analysis. Items can be
augmented by text or graphics keyed to the question.
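Selecting items on a two-dimensional matrix of keys can be sketched as filling a table whose cells request a number of items for each combination of two keys, for example topic by difficulty. The Python below assumes that layout. It is an illustration, not MENTREX's procedure, and it takes items in bank order rather than at random.

```python
from typing import Dict, List, Tuple

BankItem = Tuple[str, str, str]   # (item_id, row_key, column_key)

def select_by_matrix(bank: List[BankItem],
                     blueprint: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], List[str]]:
    """For each (row key, column key) cell, take up to the requested number of items."""
    chosen: Dict[Tuple[str, str], List[str]] = {}
    for cell, wanted in blueprint.items():
        matches = [item_id for item_id, row, col in bank if (row, col) == cell]
        chosen[cell] = matches[:wanted]
    return chosen
```

Cells that return fewer items than requested show the user where the catalog is too thin to satisfy the test plan.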




Lippey, Gerald. The computer can support test
construction in a variety of ways. Educational
Technology 13(3):10-12, March 1973.


Introductory article to this special issue of
Educational Technology magazine on CAT. Describes the
variety of ways in which CAT has been implemented and
summarizes the benefits that CAT offers. Claims that
development of CAT systems is stimulated by classroom
teachers rather than professional innovators and that
the activity is seldom supported by special funding,
and contends, therefore, that CAT satisfies educators'
needs, is financially feasible (has a high
value-to-cost ratio), and that CAT applications will
grow.


Menne, John W., and Paul Lustoreef. Computer-assisted
test assembly at Iowa State University. Paper
presented at a conference on computer-assisted test
construction, San Diego, California, October 1974.


PL/I-based system that currently stores 13,000
items on a dedicated disk pack using e4 million
characters of storage. Each item requires about 8e6
characters of storage, including about 174 characters
for classifiers and usage statistics. System design
considerations were:

(1) that each instructor must be able to use
his/her own item indexing scheme,

(2) that cost must be minimized, dictating that
the system must be operable by clerical personnel, and

(3) that the system must allow for the inclusion
of item statistics.

Most functions are done overnight at a cost of one
cent per item generated, with 3% minutes of clerical
time required to set up the program run. Currently a
batch system, and thus the cost of clerical time is
greater than the computer cost and delays are long.
Conjectures that an interactive system would result in
higher computer costs and shorter delays with the same
amount of clerical time.


Olympia, P.L., Jr. Computer generation of truly
repeatable examinations. Educational Technology
15(6):53-55, June 1975.


Discusses the disadvantages of standard item
banking systems and presents an algorithmic approach to
item generation. Questions are stored as a keyphrase pool,
statement-phrase pool, and distractor pool. Items are
constructed by joining one member of the keyphrase and
statement-phrase pools and then constructing a list of
alternatives from the distractor pool. Can be used for
multiple choice, true/false, completion, and matching
items.
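The pool-based construction can be sketched for two of the four item types. This Python version assumes that each keyphrase is stored with its matching statement phrase and that distractors are drawn at random from the distractor pool. The pool layout and the names are illustrative, not Olympia's.

```python
import random
from typing import Dict, List, Tuple

def multiple_choice(pairs: Dict[str, str], distractors: List[str], keyphrase: str,
                    n_options: int, rng: random.Random) -> Tuple[str, List[str], str]:
    """Build a stem, a shuffled option list, and the correct answer."""
    answer = pairs[keyphrase]
    pool = [d for d in distractors if d != answer]      # never offer the key twice
    options = rng.sample(pool, n_options - 1) + [answer]
    rng.shuffle(options)
    return keyphrase, options, answer

def true_false(pairs: Dict[str, str], distractors: List[str], keyphrase: str,
               rng: random.Random) -> Tuple[str, bool]:
    """Join the keyphrase to its own statement (true) or to a distractor (false)."""
    if rng.random() < 0.5:
        return f"{keyphrase} {pairs[keyphrase]}", True
    wrong = [d for d in distractors if d != pairs[keyphrase]]
    return f"{keyphrase} {rng.choice(wrong)}", False
```

Because the options are assembled at generation time, the same keyphrase yields a different item on each administration, which is what makes such examinations repeatable.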


O'Reilly, Robert P., William P. Gorth, and Paul Pinsky.
Computer-assisted test construction: an effort based
on an evaluation methodology. Educational Technology
13(3):32-34, March 1973.


Argues that the current state of the CAT art is
preoccupied with the mechanics of test construction
rather than relevant evaluation models. Describes the
Comprehensive Achievement Monitoring (CAM) system.
This system is designed to help instructors perform the
following functions:

(1) Classification: assign or eliminate students
to or from a treatment due to scarce resources.

(2) Summative evaluation of programs: select
treatment A over treatment B on effect or efficiency.

(3) Formative evaluation of programs: redesign
component A of treatment B to meet specifications.

(4) Instructional management: place student in
component A of program B; repeat student in A; etc.

(5) Curriculum validation: remove objective C
from programs A and B at level 1; place at level 2.

The system currently uses paper-based objectives and
test item banks. The computer schedules tests, indexes
current objectives and item banks, finds item numbers,
and constructs forms consisting of item numbers.
(Similar to ETS usage.) No immediate plans for
expanded computer use.


Prosser, Franklin. Repeatable tests. Educational
Technology 13(3):34-35, March 1973.


A FORTRAN-based system in use at Indiana
University. Item pools are available in English,
geography, home economics, chemistry, economics,
statistics, psychology, speech therapy, accounting, and
education. The system provides many forms of the same test,
printed individually on a line printer; 28 seconds of
CPU time are required to generate 1°80 3-page tests, with 3
hours required to print them. Cost per test is
approximately 5 cents.


Remondini, David J. Test Item System: a method of
computer assisted test assembly. Educational
Technology 13(3):35-37, March 1973.


A typical batch CAT system. Steps followed in
preparing, administering, and analyzing a test are:

(1) question selection (currently manual)

(2) test form and answer cards prepared by
computer and printed on line printer

(3) editing, e.g., addition of graphic material

(4) duplication by Xerox and Thermofax processing

(5) test administration

(6) test scoring by mark-sense computer cards

(7) record updating by computer onto cards and
item analysis listing printed

Questions are classified by CUEBS categories
(Commission on Undergraduate Education in the Biological
Sciences).


Remondini, David J., and John E. Miller. A
computerized system for preparation of tests in
academic disciplines. Proceedings of a conference on
computers in the undergraduate curricula,
pp. 7.24-7.30, The University of Iowa, Iowa City,
Iowa, June 1970.


Description of a FORTRAN-based system built for
the IBM 1130. This system is essentially the same as
the one described in Remondini's 1973 article. The control
card for each question includes biology subject area,
organization level, behavioral objective level (after
Bloom), and difficulty index.


Salisnjeck, Julian. Computer aided test preparation:
six years of experience. Educational Technology
13(3):37-38, March 1973.


Another typical system, almost identical to
Remondini's. Questions are stored on disk and manually
selected for inclusion on a test. Multiple choice,
true/false, short answer, fill-in, matching, and
"enhanced" (diagram-oriented) items are allowed. Discusses
costs, solving the cheating problem, and reduction of
the "edge" of fraternity test files.


Schonberger, Richard J. Modular instruction with
computer assembled repeatable exams: second
generation. Educational Technology 15(2):36-36,
February 1975.


Recommends the following principles for achieving
success with CAT:

(1) Make a good test item pool.

(a) Use a large number of items, i.e., a 10
to 1 ratio of items to presentations. (This is also
recommended by Prosser.) This keeps students from relying on
copies in libraries and dorms.

(b) Force yourself to test for concepts by
giving open book exams.

(c) Construct the item bank yourself, i.e.,
do not delegate the responsibility to graduate
students.

(2) Provide fast reinforcement.

(3) Provide flexibility by letting students
retake exams, and use a contract approach to grading.


Sivertson, Sigurd E., Richard H. Hansen, and Adeline O.
Schoenenberger. Computerized test bank for clinical
medicine. Educational Technology 13(3):38-39, March
1973.

System designed "to identify the continuing
education needs of physicians". Tests are constructed to
match physicians' specialties and backgrounds. Yields
relative scores on different parts of the test to show
areas in which a physician's knowledge is deficient.


Stodola, Quentin. Use of computer assembled tests in
the California State University and College System.
Educational Technology 13(3):40-41, March 1973.

Comments: "Computer assisted test assembly has
sometimes been called 'a poor man's CAI'. Computer
assisted test assembly provides some of the advantages
of CAI, such as drills, student self-pacing, and
reference to study aids, but without the high cost of
terminals for individual students and without the need
for writing highly sophisticated and complex learning
programs, which, incidentally, thus far have generally
not been written. . . . The computer operation of
assembling and scoring tests works satisfactorily. The
problem is now to create a sufficient number of useful
question banks and to orient instructors to their use."

Mentions exact areas in which questions have been
developed for use in the CSU system. The project received
$36,000 to fund the categorization, editing, and
keypunching of 9000 items originally collected by ETS.
An additional $26,000 was used for development of a
pre-calculus bank.


Toggenburger, Frank. Classroom teacher support system.
Educational Technology 13(3):42-43, March 1973.


Description of a system in use by the Los Angeles
Unified School District. Currently in use with a U.S.
History item bank; 20 teachers developed the TT
items currently available over the period of one
summer. "Exercises" are requested by filling out an
optical scan form. The generated exercises are stored
for later modification and student response checking.


Vickers, F.D. Creative test generators. Educational
Technology 13(3):43-44, March 1973.


The only article in this special issue of Educational
Technology magazine that describes test generation
without an item bank. Used to generate tests for a
FORTRAN course. Two examples: the first is very simple and
provides an excellent demonstration of creative test
generation; the second is written in SNOBOL and
includes a text formatting capability. Very
informative examples.
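Test generation without an item bank means computing both the question and its key at the moment the test is produced. This is a minimal sketch in Python rather than FORTRAN or SNOBOL. It assumes a hypothetical question type about FORTRAN integer division and keeps the operands positive so that Python's floor division agrees with FORTRAN's truncation.

```python
import random
from typing import Tuple

def integer_division_item(rng: random.Random) -> Tuple[str, int]:
    """Generate one question and its answer; no stored item bank is needed."""
    a = rng.randint(5, 99)
    b = rng.randint(2, 9)
    stem = (f"After the FORTRAN statement  I = {a}/{b}  is executed, "
            f"what value is stored in the integer variable I?")
    return stem, a // b   # positive operands: truncation and floor agree
```

Each call draws fresh operands, so every student can receive a different but equivalent question, and the key is always correct because it is computed rather than stored.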


